Captioning system

ABSTRACT

A captioning system is provided for providing captions for audio and/or video presentations. The captioning system can be used to provide text captions or audio descriptions of a video presentation. A user device is provided for the captioning system having a receiver operable to receive the captions together with synchronization information and a caption output circuit which is operable to output the captions at the appropriate timings defined by the synchronization information. The user device is preferably a portable hand-held device such as a mobile telephone, PDA or the like.

The present invention relates to a system and method and parts thereof for providing captions for audio or video or multi-media presentations. The invention has particular though not exclusive relevance to the provision of such a captioning system to facilitate the enjoyment of the audio, video or multimedia presentation by people with sensory disabilities.

A significant proportion of the population with hearing difficulties benefit from captions (in the form of text) on video images such as TV broadcasts, video tapes, DVD and films. There are currently two types of captioning systems available for video images: on-screen caption systems and off-screen caption systems. In on-screen caption systems, the caption text is displayed on-screen and it obscures part of the image. This presents a particular problem with cinema where there is a reluctance for this to happen with general audiences. In the off-screen caption system, the text is displayed on a separate screen. Whilst this overcomes some of the problems associated with the on-screen caption system, this solution adds additional cost and complexity and currently has had poor takeup in cinemas for this reason.

In addition to text captioning systems for people with hearing difficulties, there are also captioning systems which provide audio captions for people with impaired eyesight. In this type of audio captioning system, an audio description of what is being displayed is provided to the user in a similar way to the way in which subtitles are provided for the hard of hearing.

One aim of the present invention is to provide an alternative captioning system for the hard of hearing or an alternative captioning system for those with impaired eyesight. The captioning system can also be used by those without impaired hearing or eyesight, for example, to provide different language captions or the lyrics for songs.

According to one aspect, the present invention provides a captioning system comprising: a caption store for storing one or more sets of captions each being associated with one or more presentations and each set comprising at least one caption for playout at different timings during the associated presentation; and a user device having: (i) a memory for receiving and storing at least one set of captions for a presentation from the caption store; (ii) a receiver operable to receive synchronisation information defining the timing during the presentation at which each caption in the received set of captions is to be output to the user; and (iii) a caption output circuit operable to output to the associated user, the captions in the received set of captions at the timings defined by the synchronisation information.

In one embodiment, the captions are text captions which are output to the user on a display associated with the user device. In another embodiment, the captions are audio signals which are output to the user as acoustic signals via a loudspeaker or earphone. The captioning system can be used, for example in cinemas, to provide captions to people with sensory disabilities to facilitate their understanding and enjoyment of, for example, films or other multimedia presentations.

The user device is preferably a portable hand-held device such as a mobile telephone or personal digital assistant, as these are small and lightweight and most users have access to them. The use of such a portable computing device is also preferred since it is easy to adapt the device to operate in the above manner by providing the device with appropriate software.

The caption store may be located in a remote server in which case the user device is preferably a mobile telephone (or a PDA having wireless connectivity) as this allows for the direct connection between the user device and the remote server. Alternatively, the caption store may be a kiosk at the venue at which the presentation is to be made, in which case the user can download the captions and synchronisation information when they arrive. Alternatively, the caption store may simply be a memory card or smart-card which the user can insert into their user device in order to obtain the set of captions for the presentation together with the synchronisation information.

According to another aspect, the present invention provides a method of manufacturing a computer readable medium storing caption data and synchronisation data for use in a captioning system, the method comprising: providing a computer readable medium; providing a set of captions that is associated with a presentation which comprises a plurality of captions for playout at different timings during the associated presentation; providing synchronisation information defining the timing during the presentation at which each caption in the set of captions is to be output to a user; receiving a computer readable medium; recording computer readable data defining said set of captions and said synchronisation information on said computer readable medium; and outputting the computer readable medium having the recorded caption and synchronisation data thereon.

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which:

FIG. 1 is a schematic overview of a captioning system embodying the present invention;

FIG. 2 a is a schematic block diagram illustrating the main components of a user telephone that is used in the captioning system shown in FIG. 1;

FIG. 2 b is a table representing the captions in a caption file downloaded to the telephone shown in FIG. 2 a from the remote web server shown in FIG. 1;

FIG. 2 c is a representation of a synchronisation file downloaded to the mobile telephone shown in FIG. 2 a from the remote web server shown in FIG. 1;

FIG. 2 d is a timing diagram illustrating the timing of synchronisation signals and illustrating timing windows during which the mobile telephone processes an audio signal from a microphone thereof;

FIG. 2 e is a signal diagram illustrating an exemplary audio signal received by a microphone of the telephone shown in FIG. 2 a and the signature stream generated by a signature extractor forming part of the mobile telephone;

FIG. 2 f illustrates an output from a correlator forming part of the mobile telephone shown in FIG. 2 a, which is used to synchronise the display of captions to the user with the film being watched;

FIG. 2 g schematically illustrates a screen shot from the telephone illustrated in FIG. 2 a showing an example caption that is displayed to the user;

FIG. 3 is a schematic block diagram illustrating the main components of the remote web server forming part of the captioning system shown in FIG. 1;

FIG. 4 is a schematic block diagram illustrating the main components of a portable user device of an alternative embodiment; and

FIG. 5 is a schematic block diagram illustrating the main components of the remote server used with the portable user device shown in FIG. 4.

OVERVIEW

FIG. 1 schematically illustrates a captioning system for use in providing text captions on a number of user devices (two of which are shown and labelled 1-1 and 1-2) for a film being shown on a screen 3 within a cinema 5. The captioning system also includes a remote web server 7 which controls access by the user devices 1 to captions stored in a captions database 9. In particular, in this embodiment, the user device 1-1 is a mobile telephone which can connect to the remote web server 7 via a cellular communications base station 11, a switching centre 13 and the Internet 15 to download captions from the captions database 9. In this embodiment, the second user device 1-2 is a personal digital assistant (PDA) that does not have cellular telephone transceiver circuitry. This PDA 1-2 can, however, connect to the remote web server 7 via a computer 17 which can connect to the Internet 15. The computer 17 may be a home computer located in the user's home 19 and may typically include a docking station 21 for connecting the PDA 1-2 with the computer 17.

In this embodiment, the operation of the captioning system using the mobile telephone 1-1 is slightly different to the operation of the captioning system using the PDA 1-2. A brief description of the operation of the captioning system using these devices will now be given.

In this embodiment, the mobile telephone 1-1 operates to download the captions for the film to be viewed at the start of the film. It does this by capturing a portion of soundtrack from the beginning of the film, generated by speakers 23-1 and 23-2, which it processes to generate a signature that is characteristic of the audio segment. The mobile telephone 1-1 then transmits this signature to the remote web server 7 via the base station 11, switching centre 13 and the Internet 15. The web server 7 then identifies the film that is about to begin from the signature and retrieves the appropriate caption file together with an associated synchronisation file which it transmits back to the mobile telephone 1-1 via the Internet 15, switching centre 13 and base station 11. After the caption file and the synchronisation file have been received by the mobile telephone 1-1, the connection with the base station 11 is terminated and the mobile telephone 1-1 generates and displays the appropriate captions to the user in synchronism with the film that is shown on the screen 3. In this embodiment, the synchronisation data in the synchronisation file downloaded from the remote web server 7 defines the estimated timing of subsequent audio segments within the film and the mobile telephone 1-1 synchronises the playout of the captions by processing the audio signal of the film and identifying the actual timing of those subsequent audio segments in the film.

In this embodiment, the user of the PDA 1-2 downloads the captions for the film while they are at home 19 using their personal computer 17 in advance of the film being shown. In particular, in this embodiment, the user types the name of the film that they are going to see into the personal computer 17 and then sends this information to the remote web server 7 via the Internet 15. In response, the web server 7 retrieves the appropriate caption file and synchronisation file for the film which it downloads to the user's personal computer 17. The personal computer 17 then stores the caption file and the synchronisation file in the PDA 1-2 via the docking station 21. In this embodiment, the subsequent operation of the PDA 1-2 to synchronise the display of the captions to the user during the film is the same as the operation of the mobile telephone 1-1 and will not, therefore, be described again.

Mobile Telephone

A brief description has been given above of the way in which the mobile telephone 1-1 retrieves and subsequently plays out the captions for a film to a user. A more detailed description will now be given of the main components of the mobile telephone 1-1 which are shown in block form in FIG. 2 a. As shown, the mobile telephone 1-1 includes a microphone 41 for detecting the acoustic sound signal generated by the speakers 23 in the cinema 5 and for generating a corresponding electrical audio signal. The audio signal from the microphone 41 is then filtered by a filter 43 to remove frequency components that are not of interest. The filtered audio signal is then converted into a digital signal by the analogue to digital converter (ADC) 45 and then stored in an input buffer 47. The audio signal written into the input buffer 47 is then processed by a signature extractor 49 which processes the audio to extract a signature that is characteristic of the buffered audio. Various processing techniques can be used by the signature extractor 49 to extract this signature. For example, the signature extractor may carry out the processing described in WO 02/11123 in the name of Shazam Entertainment Limited. In this system, a window of about 15 seconds of audio is processed to identify a number of “fingerprints” along the audio string that are representative of the audio. These fingerprints, together with timing information of when they occur within the audio string, form the above described signature.

As shown in FIG. 2 a, the signature generated by the signature extractor is then output to an output buffer 51 and then transmitted to the remote web server 7 via the antenna 53, a transmission circuit 55, a digital to analogue converter (DAC) 57 and a switch 59.

As will be described in more detail below, the remote server 7 then processes the received signature to identify the film that is playing and to retrieve the appropriate caption file and synchronisation file for the film. These are then downloaded back to the mobile telephone 1-1 and passed, via the antenna 53, reception circuit 61 and analogue to digital converter 63 to a caption memory 65. FIG. 2 b schematically illustrates the form of the caption file 67 downloaded from the remote web server 7. As shown, in this embodiment, the caption file 67 includes an ordered sequence of captions (caption(1) to caption(N)) 69-1 to 69-N. The caption file 67 also includes, for each caption, formatting information 71-1 to 71-N that defines the font, colour, etc. of the text to be displayed. The caption file 67 also includes, for each caption, a time value t_1 to t_N which defines the time at which the caption should be output to the user relative to the start of the film. Finally, in this embodiment, the caption file 67 includes, for each caption 69, a duration Δt_1 to Δt_N which defines the duration that the caption should be displayed to the user.
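By way of illustration only, the per-caption fields of the caption file 67 described above can be sketched in Python as follows. The class name `Caption`, the function `caption_at` and the sample values are hypothetical and do not appear in the embodiment; the sketch assumes that captions are sorted by start time and do not overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Caption:
    text: str          # caption(i)
    formatting: dict   # formatting information 71 (font, colour, etc.)
    start: float       # t_i, seconds from the start of the film
    duration: float    # Δt_i, seconds the caption stays on the display


def caption_at(captions, film_time):
    """Return the caption visible at film_time, or None between captions.

    Assumes captions are sorted by start time and do not overlap.
    """
    starts = [c.start for c in captions]
    i = bisect_right(starts, film_time) - 1   # last caption starting at or before film_time
    if i < 0:
        return None
    c = captions[i]
    return c if film_time < c.start + c.duration else None


# Hypothetical sample data, for illustration only.
captions = [
    Caption("Hello.", {"colour": "white"}, start=12.0, duration=2.5),
    Caption("Where are we?", {"colour": "yellow"}, start=15.0, duration=3.0),
]
```

A caption display engine could call `caption_at` with the current film time supplied by a timing controller in order to decide what, if anything, to show.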

FIG. 2 c schematically represents the data within the synchronisation file 73 which is used in this embodiment by the mobile telephone 1-1 to synchronise the display of the captions with the film. As shown, the synchronisation file 73 includes a number of signatures 75-1 to 75-M each having an associated time value t^s_1 to t^s_M identifying the time at which the signature should occur within the audio of the film (again calculated from the beginning of the film).

In this embodiment, the synchronisation file 73 is passed to a control unit 81 which controls the operation of the signature extractor 49 and a sliding correlator 83. The control unit 81 also controls the position of the switch 59 so that after the caption and synchronisation files have been downloaded into the mobile telephone 1-1, and the mobile telephone 1-1 is trying to synchronise the output of the captions with the film, the signature stream generated by the signature extractor 49 is passed to the sliding correlator 83 via the output buffer 51 and the switch 59.

Initially, before the captions are output to the user, the mobile telephone 1-1 must synchronise with the film that is playing. This is achieved by operating the signature extractor 49 and the sliding correlator 83 in an acquisition mode, during which the signature extractor extracts signatures from the audio received at the microphone 41 which are then compared with the signatures 75 in the synchronisation file 73, until it identifies a match between the received audio from the film and the signatures 75 in the synchronisation file 73. This match identifies the current position within the film, which is used to identify the initial caption to be displayed to the user. At this point, the mobile telephone 1-1 enters a tracking mode during which the signature extractor 49 only extracts signatures for the audio during predetermined time slots (or windows) within the film corresponding to when the mobile telephone 1-1 expects to detect the next signature in the audio track of the film. This is illustrated in FIG. 2 d which shows a time line (representing the time line for the film) together with the timings t^s_1 to t^s_M corresponding to when the mobile telephone 1-1 expects the signatures to occur within the audio track of the film. FIG. 2 d also shows a small time slot or window w_1 to w_M around each of these time points, during which the signature extractor 49 processes the audio signal to generate a signature stream which it outputs to the output buffer 51.
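The processing windows used in the tracking mode can be sketched as follows. This is a minimal illustration that assumes each window is centred on its expected signature time and has a fixed half-width; the function names and the 2-second half-width are hypothetical and are not taken from the embodiment.

```python
def tracking_windows(expected_times, half_width):
    """Return one (start, end) window centred on each expected signature time."""
    return [(max(0.0, t - half_width), t + half_width) for t in expected_times]


def should_process(film_time, windows):
    """True when film_time falls inside any processing window.

    Outside the windows the signature extractor would stay idle.
    """
    return any(start <= film_time <= end for start, end in windows)


# Hypothetical expected signature times (seconds from the start of the film).
windows = tracking_windows([60.0, 180.0], half_width=2.0)
```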

The generation of the signature stream is illustrated in FIG. 2 e which shows a portion 77 of the audio track corresponding to one of the time windows w_j and the stream 79 of signatures generated by the signature extractor 49. In this embodiment, three signatures (signature (i), signature (i+1) and signature (i+2)) are generated for each processing window w. This is for illustration purposes only. In practice, many more or fewer signatures may be generated for each processing window w. Further, whilst in this embodiment the signatures are generated from non-overlapping subwindows of the processing window w, the signatures may also be generated from overlapping subwindows. The way in which this would be achieved will be well known to those skilled in the art and will not be described in any further detail.

In this embodiment, between adjacent processing windows w, the control unit 81 controls the signature extractor 49 so that it does not process the received audio. In this way, the processing performed by the signature extractor 49 can be kept to a minimum.

During this tracking mode of operation, the sliding correlator 83 is operable to correlate the generated signature stream in output buffer 51 with the next signature 75 in the synchronisation file 73. This correlation generates a correlation plot such as that shown in FIG. 2 f for the window of audio being processed. As shown in FIG. 2 d, in this embodiment, the windows w_j are defined so that the expected timing of the signature is in the middle of the window. This means that the mobile telephone 1-1 expects the peak output from the sliding correlator 83 to correspond to the middle of the processing window w. If the peak occurs earlier or later in the window then the caption output timing of the mobile telephone 1-1 must be adjusted to keep it in synchronism with the film. This is illustrated in FIG. 2 f which shows the expected time of the signature t_s appearing in the middle of the window and the correlation peak occurring δt seconds before the expected time. This means that the mobile telephone 1-1 is slightly behind the film and the output timing of the subsequent captions must be brought forward to catch up with the film. This is achieved by passing the δt value from the correlator 83 into a timing controller 85 which generates the timing signal for controlling the time at which the captions are played out to the user. As shown, the timing controller receives its timing reference from the mobile telephone clock 87. The generated timing signal is then passed to a caption display engine 89 which uses the timing signal to index the caption file 67 in order to retrieve the next caption 69 for display together with the associated duration information Δt and formatting information 71 which it then processes and outputs for display on the mobile telephone display 91 via a frame buffer 93. The details of how the caption 69 is generated and formatted are well known to those skilled in the art and will not be described in any further detail.
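The timing correction described above can be sketched as follows. This is a simplified illustration only: it assumes that the signature stream and the reference signature are short numeric sequences sampled at a fixed step, and it uses an unnormalised dot product as the correlation measure, which is a stand-in rather than the correlation actually performed on fingerprint-based signatures. All names (`sliding_correlation`, `timing_error`, `TimingController`) are hypothetical.

```python
def sliding_correlation(stream, reference):
    """Correlate reference against every alignment within stream."""
    n = len(reference)
    return [
        sum(s * r for s, r in zip(stream[k:k + n], reference))
        for k in range(len(stream) - n + 1)
    ]


def timing_error(stream, reference, step):
    """Return δt in seconds: positive when the correlation peak occurs
    before the middle of the window, negative when it occurs after."""
    scores = sliding_correlation(stream, reference)
    peak = max(range(len(scores)), key=scores.__getitem__)
    centre = (len(scores) - 1) / 2
    return (centre - peak) * step


class TimingController:
    """Estimates the film position from the device clock plus corrections."""

    def __init__(self):
        self.offset = 0.0

    def apply(self, delta_t):
        # A peak δt seconds early means the device is behind the film,
        # so the film-time estimate is brought forward by δt.
        self.offset += delta_t

    def film_time(self, clock_time):
        return clock_time + self.offset
```

In this sketch a positive δt brings subsequent captions forward and a negative δt delays them, matching the adjustment illustrated in FIG. 2 f.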

FIG. 2 g illustrates the form of an example caption which is output on the display 91. FIG. 2 g also shows in the right hand side 95 of the display 91 a number of user options that the user can activate by pressing appropriate function keys on the keypad 97 of the mobile telephone 1-1. These include a language option 99 which allows the user to change the language of the caption 69 that is displayed. This is possible, provided the caption file 67 includes captions 69 in different languages. As the skilled man will appreciate, this does not involve any significant processing on the part of the mobile telephone 1-1, since all that is being changed is the text of the caption 69 that is to be displayed at the relevant timings. It is therefore possible to personalise the captions for different users watching the same film. The options also include an exit option 101 for allowing the user to exit the captioning application being run on the mobile telephone 1-1.

Personal Digital Assistant

As mentioned above, the PDA 1-2 operates in a similar way to the mobile telephone 1-1 except it does not include the mobile telephone transceiver circuitry for connecting directly to the web server 7. The main components of the PDA 1-2 are similar to those of the mobile telephone 1-1 described above and will not, therefore, be described again.

Remote Web Server

FIG. 3 is a schematic block diagram illustrating the main components of the web server 7 used in this embodiment and showing the captions database 9. As shown, the web server 7 receives input from the Internet 15 which is passed either to a sliding correlator 121 or to a database reader 123, depending on whether the input is from the mobile telephone 1-1 or from the PDA 1-2. In particular, the signature from the mobile telephone 1-1 is input to the sliding correlator 121 where it is compared with signature streams of all films known to the system, which are stored in the signature stream database 125. The results of these correlations are then compared to identify the film that the user is about to watch. This film ID is then passed to the database reader 123. In response to receiving a film ID either from the sliding correlator 121 or directly from a user device (such as the PC 17 or PDA 1-2), the database reader 123 reads the appropriate caption file 67 and synchronisation file 73 from the captions database 9 and outputs them to a download unit 127. The download unit 127 then downloads the retrieved caption file 67 and synchronisation file 73 to the requesting user device 1 via the Internet 15.
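The two request paths through the web server 7 can be sketched as follows. This is an illustrative outline only: signatures are modelled as short numeric sequences, the correlation measure is an unnormalised dot product chosen for simplicity, and the request format, dictionary layout and function names are all hypothetical. The sketch also assumes every stored signature stream is at least as long as the query.

```python
def best_correlation(query, stream):
    """Highest sliding dot product of query against stream."""
    n = len(query)
    return max(
        sum(q * s for q, s in zip(query, stream[k:k + n]))
        for k in range(len(stream) - n + 1)
    )


def identify_film(query, signature_db):
    """Return the film ID whose stored signature stream best matches query."""
    return max(signature_db, key=lambda film_id: best_correlation(query, signature_db[film_id]))


def handle_request(request, signature_db, captions_db):
    """Route a request: a signature goes through the correlator, whereas a
    film ID goes straight to the database lookup."""
    if "signature" in request:
        film_id = identify_film(request["signature"], signature_db)
    else:
        film_id = request["film_id"]
    return captions_db[film_id]   # (caption file, synchronisation file)


# Hypothetical databases, for illustration only.
signature_db = {"film_a": [1, 0, 0, 1, 0], "film_b": [0, 1, 2, 1, 0]}
captions_db = {"film_a": ("a.cap", "a.sync"), "film_b": ("b.cap", "b.sync")}
```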

As those skilled in the art will appreciate, a captioning system has been described above for providing text captions for a film for display to a user. The system does not require any modifications to the cinema or playout system, but only the provision of a suitably adapted mobile telephone 1-1 or PDA device 1-2 or the like. In this regard, it is not essential to add any additional hardware to the mobile telephone or the PDA, since all of the functionality enclosed in the dashed box 94 can be performed by an appropriate software application run within the mobile telephone 1-1 or PDA 1-2. In this case, the appropriate software application may be loaded at the appropriate time, e.g. when the user enters the cinema and, in the case of the mobile telephone 1-1, is arranged to cancel the ringer on the telephone so that incoming calls do not disturb others in the audience. The above captioning system can therefore be used for any film at any time. Further, since different captions can be downloaded for a film, the system allows for content variation within a single screening. This facilitates, for example, the provision of captions in multiple languages.

Modifications and Alternative Embodiments

In the above embodiment, a captioning system was described for providing text captions on a display of a portable user device for allowing users with hearing disabilities to understand a film being watched. As discussed in the introduction of this application, the above captioning system can be modified to operate with audio captions (e.g. audio descriptions of the film being displayed for people with impaired eyesight). This may be done simply by replacing the text captions 69 in the caption file 67 that is downloaded from the remote server 7 with appropriate audio files (such as the standard .WAV or MP3 audio files) which can then be played out to the user via an appropriate headphone or earpiece. The synchronisation of the playout of the audio files could be the same as for the synchronisation of the playout of the text captions. Alternatively synchronisation can be achieved in other ways. FIG. 4 is a block diagram illustrating the main components of a mobile telephone that can be used in such an audio captioning system. In FIG. 4, the same reference numerals have been used for the same components shown in FIG. 2 a and these components will not be described again.

In this embodiment, the mobile telephone 1-1′ does not include the signature extractor 49. Instead, as illustrated in FIG. 5, the signature extractor 163 is provided in the remote web server 7′. In operation, the mobile telephone 1-1′ captures part of the audio played out at the beginning of the film and transmits this audio through to the remote web server 7′. This audio is then buffered in the input buffer 161 and then processed by the signature extractor 163 to extract a signature representative of the audio. This signature is then passed to a correlation table 165 which performs a similar function to the sliding correlator 121 and signature stream database 125 described in the first embodiment, to identify the ID for the film currently being played. In particular, in this embodiment, all of the possible correlations that may have been performed by the sliding correlator 121 and the signature stream database 125 are carried out in advance and the results are stored in the correlation table 165. In this way, the signature output by the signature extractor 163 is used to index this correlation table to generate correlation results for the different films known to the captioning system. These correlation results are then processed to identify the most likely film corresponding to the received audio stream. In this embodiment, the captions database 9 only includes the caption files 67 for the different films, without any synchronisation files 73. In response to receiving the film ID either from the correlation table 165 or directly from a user device (not shown), the database reader 123 retrieves the appropriate caption file 67 which it downloads to the user device 1 via the download unit 127.

Returning to FIG. 4, in this embodiment, since the mobile telephone 1-1′ does not include the signature extractor 49, synchronisation is achieved in an alternative manner. In particular, in this embodiment, synchronisation codes are embedded within the audio track of the film. Therefore, after the caption file 67 has been stored in the caption memory 65, the control unit 81 controls the position of the switch 143 so that the audio signal input into the input buffer 47 is passed to a data extractor 145 which is arranged to extract the synchronisation data that is embedded in the audio track. The extracted synchronisation data is then passed to the timing controller 85 which controls the timing at which the individual audio captions are played out by the caption player 147 via the digital-to-analogue converter 149, amplifier 151 and the headset 153.

As those skilled in the art will appreciate, various techniques can be used to embed the synchronisation data within the audio track. The applicant's earlier International applications WO 98/32248, WO 01/10065, PCT/GB01/05300 and PCT/GB01/05306 describe techniques for embedding data within acoustic signals and appropriate data extractors for subsequently extracting the embedded data. The contents of these earlier International applications are incorporated herein by reference.

In the above audio captioning embodiment, synchronisation was achieved by embedding synchronisation codes within the audio and detecting these in the mobile telephone. As those skilled in the art will appreciate, a similar technique may be used in the first embodiment. However, embedding audio codes within the soundtrack of the film is not preferred, since it involves modifying in some way the audio track of the film. Depending on the data rates involved, this data may be audible to some viewers which may detract from their enjoyment of the film. The first embodiment is therefore preferred since it does not involve any modification to the film or to the cinema infrastructure.

In embodiments where the synchronisation data is embedded within the audio, the synchronisation codes used can either be the same code repeated whenever synchronisation is required or a unique code at each synchronisation point. The advantage of having a unique code at each synchronisation point is that a user who enters the film late or who requires the captions only at certain points (for example a user who only rarely requires the caption) can start captioning at any point during the film.
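The difference between the two coding schemes can be sketched as follows. The code values, the times and the function names are hypothetical and serve only to illustrate the point.

```python
def position_from_repeated_code(codes_seen, sync_times):
    """Same code at every point: the position is known only by counting
    every code detected since the start of the film."""
    return sync_times[codes_seen - 1]


def position_from_unique_code(code, code_times):
    """Unique code per point: a single detected code gives the position,
    or None if the code is not recognised."""
    return code_times.get(code)


# Hypothetical synchronisation points (seconds from the start of the film).
sync_times = [60.0, 180.0, 300.0]
code_times = {"A1": 60.0, "B2": 180.0, "C3": 300.0}
```

A user who joins late and detects only the code "B2" can be placed at 180 seconds straight away, whereas with a repeated code the device cannot tell which synchronisation point it has just heard.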

In the embodiment described above with reference to FIGS. 4 and 5, the signature extraction operation was performed in the remote web server rather than in the mobile telephone. As those skilled in the art will appreciate, this modification can also be made to the first embodiment described above, without the other modifications described with reference to FIGS. 4 and 5.

In the first embodiment described above, during the tracking mode of operation, the signature extractor only processed the audio track during predetermined windows in the film. As those skilled in the art will appreciate, this is not essential. The signature extractor could operate continuously. However, such an embodiment is not preferred since it increases the processing that the mobile telephone has to perform which is likely to increase the power consumption of the mobile telephone.

In the above embodiments, the mobile telephone or PDA monitored the audio track of the film for synchronisation purposes. As those skilled in the art will appreciate, the mobile telephone or PDA device may be configured to monitor the video being displayed on the film screen. However, this is currently not preferred because it would require an image pickup device (such as a camera) to be incorporated into the mobile telephone or PDA and relatively sophisticated image processing hardware and software to be able to detect the synchronisation points or codes in the video. Further, it is not essential to detect synchronisation codes or synchronisation points from the film itself. Another electromagnetic or pressure wave signal may be transmitted in synchronism with the film to provide the synchronisation points or synchronisation codes. In this case, the user device would have to include an appropriate electromagnetic or pressure wave receiver. However, this embodiment is not preferred since it requires modification to the existing cinema infrastructure and it requires the generation of the separate synchronisation signal which is itself synchronised to the film.

In the above embodiments, the captions and, where appropriate, the synchronisation data were downloaded to a user device from a remote server. As those skilled in the art will appreciate, the use of such a remote server is not essential. The caption data and the synchronisation data may be pre-stored in memory cards or smart cards and distributed or sold at the cinema. In this case, the user device would preferably have an appropriate slot for receiving the memory card or smart-card and an appropriate reader for accessing the caption data and, if provided, the synchronisation data. The manufacture of the cards would include the steps of providing the memory card or smart-card and using an appropriate card writer to write the captions and synchronisation data into the memory card or into a memory on the smart-card. Alternatively still, the user may already have a smart-card or memory card associated with their user device which they simply insert into a kiosk at the cinema where the captions and, if applicable, the synchronisation data are written into a memory on the card.

As a further alternative, the captions and synchronisation data may be transmitted to the user device from a transmitter within the cinema. This transmission may be over an electromagnetic or a pressure wave link.

In the first embodiment described above, the mobile telephone had an acquisition mode and a subsequent tracking mode for controlling the playout of the captions. In an alternative embodiment, the acquisition mode may be dispensed with, provided that the remote server can identify the current timing from the signature received from the mobile telephone. This may be possible in some instances. However, if the introduction of the film is repetitive then it may not be possible for the web server to provide an initial synchronisation.
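By way of illustration only, the two modes of operation may be sketched as follows. The sketch assumes that each synchronisation point carries an expected time and a stored signature; the names SyncPoint, similarity, acquire and track, the element-wise comparison, the 0.8 threshold and the 2 second window are all hypothetical and are not taken from the embodiments described above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SyncPoint:
    expected_time: float        # seconds from the start of the film
    signature: Tuple[int, ...]  # hypothetical fixed-length signature


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of matching elements; a stand-in for a real signature comparison."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b), 1)


def acquire(observed: Sequence[int], points: Sequence[SyncPoint],
            threshold: float = 0.8) -> Optional[int]:
    """Acquisition mode: compare the observed signature with every stored one."""
    best_index, best_score = None, 0.0
    for index, point in enumerate(points):
        score = similarity(observed, point.signature)
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None


def track(observed: Sequence[int], observed_time: float, expected: SyncPoint,
          offset: float, window: float = 2.0,
          threshold: float = 0.8) -> Optional[float]:
    """Tracking mode: compare only with the expected signature, and only
    inside a time window around its predicted arrival time.

    Returns the updated offset (actual minus expected time), or None if
    the expected synchronisation point was not detected.
    """
    predicted = expected.expected_time + offset
    if abs(observed_time - predicted) > window:
        return None
    if similarity(observed, expected.signature) < threshold:
        return None
    return observed_time - expected.expected_time
```

In such a sketch, the device would call acquire until a position is found and thereafter call track for each expected synchronisation point, falling back to acquire if several expected points in succession are missed.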

In the first embodiment described above, the user devices downloaded the captions and synchronisation data from a remote web server via the internet. As those skilled in the art will appreciate, it is not essential to download the files over the internet. The files may be downloaded over any wide area or local area network. The ability to download the caption files from a wide area network is preferred since centralised databases of captions may be provided for distribution over a wider geographic area.

In the first embodiment described above, the user downloaded captions and synchronisation data from a remote web server. Although not described, for security purposes, the caption file and the synchronisation file are preferably encoded or encrypted in some way to guard against fraudulent use of the captions. Additionally, the caption system may be arranged so that it can only operate in cinemas or at venues that are licensed under the captioning system. In this case, an appropriate activation code may be provided at the venue in order to “unlock” the captioning system on the user device. This activation code may be provided in human readable form so that the user has to key the code into the user device. Alternatively, the venue may be arranged to transmit the code (possibly embedded in the film) to an appropriate receiver in the user device. In either case, the captioning system software in the user device would have an inhibitor that would inhibit the outputting of the captions until it received the activation code. Further, where encryption is used, the activation code may be used as part of the key for decrypting the captions.
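Purely as an illustrative sketch, the inhibitor and the use of the activation code as part of a decryption key might be arranged as follows. The class CaptionInhibitor, the function derive_key, the use of a SHA-256 hash of the code and of PBKDF2 for key derivation are assumptions made for the example; the description above does not specify any particular scheme, and the decryption step itself is omitted.

```python
import hashlib
import hmac
from typing import Optional


class CaptionInhibitor:
    """Blocks caption output until the venue's activation code is received."""

    def __init__(self, expected_code_hash: bytes) -> None:
        # Only a hash of the code is held on the device (an assumption).
        self._expected = expected_code_hash
        self._unlocked = False

    def submit_code(self, code: str) -> bool:
        """Accept a code keyed in by the user or received from the venue."""
        digest = hashlib.sha256(code.encode("utf-8")).digest()
        self._unlocked = hmac.compare_digest(digest, self._expected)
        return self._unlocked

    def output(self, caption: str) -> Optional[str]:
        """Return the caption for output, or None while inhibited."""
        return caption if self._unlocked else None


def derive_key(activation_code: str, file_secret: bytes, salt: bytes) -> bytes:
    """Combine the activation code with a per-file secret to form a 32-byte key."""
    material = file_secret + activation_code.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", material, salt, 100_000, dklen=32)
```

With this arrangement a caption file obtained without the venue's code can be neither output nor decrypted, since the code contributes to the key material.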

The above embodiments have described text captioning systems and audio captioning systems for use in a cinema. As those skilled in the art will appreciate, these captioning systems may be used for providing captions for any radio, video or multi-media presentation. They can also be used in the theatre or opera or within the user's home.

Various captioning systems have been described above which provide text or audio captions for an audio or a video presentation. The captions may include extra commentary about the audio or video presentation, such as director's comments, explanation of complex plots, the names of actors in the film or third party comments. The captions may also include adverts for other products or presentations. In addition, the audio captioning system may be used not only to provide audio descriptions of what is happening in the film, but also to provide a translation of the audio track for the film. In this way, each listener watching the film can listen to it in their preferred language. The caption system can also be used to provide karaoke captions for use with standard audio tracks. In this case, the user would download the lyrics and the synchronisation information which defines the timing at which the lyrics should be displayed and highlighted to the user.

In addition to the above, the captioning system described above may be provided to control the display of video captions. For example, such video captions can be used to provide sign language (either real images or computer generated images) for the audio in the presentation being given.

In the above embodiments, the captions for the presentation to be made were downloaded in advance for playout. In an alternative embodiment, the captions may be downloaded from the remote server by the user device when they are needed. For example, the user device may download the next caption when it receives the next synchronisation code for the next caption.
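A minimal sketch of such on-demand operation is given below, assuming that each synchronisation code can be used directly as a lookup key for its caption. The class OnDemandCaptions and the injected fetch function are hypothetical; in practice the fetch function would wrap whatever request the user device makes to the remote server.

```python
from typing import Callable, Dict, Optional


class OnDemandCaptions:
    """Downloads each caption only when its synchronisation code is detected."""

    def __init__(self, fetch: Callable[[int], Optional[str]]) -> None:
        # fetch stands in for the request to the remote server (an assumption).
        self._fetch = fetch
        self._cache: Dict[int, str] = {}

    def on_sync_code(self, code: int) -> Optional[str]:
        """Return the caption for a detected code, downloading it if needed."""
        if code not in self._cache:
            caption = self._fetch(code)
            if caption is None:
                return None  # the server holds no caption for this code
            self._cache[code] = caption
        return self._cache[code]
```

Caching the downloaded captions means that a synchronisation code detected twice does not cause a second download.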

In the caption system described above, a user downloads or receives the captions and the synchronisation information either from a web server or locally at the venue at which the audio or visual presentation is to be made. As those skilled in the art will appreciate, for applications where the user has to pay to download or playout the captions, a transaction system is preferably provided to facilitate the collection of the monies due. In embodiments where the captions are downloaded from a web server, this transaction system preferably forms part of or is associated with the web server providing the captions. In this case, the user can provide electronic payment or payment through credit card or the like at the time that they download the captions. This is preferred, since it is easier to link the payment being made with the captions and synchronisation information downloaded.

In the first embodiment described above, the ID for the film was automatically determined from an audio signature transmitted from the user's mobile telephone. Alternatively, instead of transmitting the audio signature, the user can input the film ID directly into the telephone for transmission to the remote server. In this case, the correlation search of the signature database is not essential.

In the first embodiment described above, the user device processed the received audio to extract a signature characteristic of the film that the user is about to watch. The processing that is preferred is the processing described in the Shazam Entertainment Ltd patent mentioned above. However, as those skilled in the art will appreciate, other types of encoding may be performed. The main purpose of the signature extractor unit in the mobile telephone is to compress the audio to generate data that is still representative of the audio and from which the remote server can identify the film about to be watched. Various other compression schemes may be used. For example, a GSM codec together with other audio compression algorithms may be used.

In the above embodiments in which text captions are provided, they were displayed to the user on a display of a portable user device. Whilst this offers the simplest deployment of the captioning system, other options are available. For example, the user may be provided with an active or passive type head-up-display through which the user can watch the film and on which the captions are displayed (active) or are projected (passive) to overlay onto the film being watched. This has the advantage that the user does not have to watch two separate displays. A passive type of head-up-display can be provided, for example, by providing the user with a pair of glasses having a beam splitter (e.g. a 45° prism) on which the user can see the cinema screen and the screen of their user device (e.g. phone or PDA) sitting on their lap. Alternatively, instead of using a head-up-display, a separate transparent screen may be erected in front of the user's seat and onto which the captions are projected by the user device or a seat-mounted projector.

In the first embodiment described above, the caption file included a time ordered sequence of captions together with associated formatting information and timing information. As those skilled in the art will appreciate, it is not essential to arrange the captions in such a time sequential order. However, arranging them in this way reduces the processing involved in identifying the next caption to display. Further, it is not essential to have formatting information in addition to the caption. The minimum information required is the caption information. Further, it is not essential that this be provided in a file as each of the individual captions for the presentation may be downloaded separately. However, the above described format for the caption file is preferred since it is simple and can easily be created using, for example, a spreadsheet. This simplicity also provides the potential to create a variety of different caption content.
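To illustrate why a time ordered sequence reduces the processing required, the following sketch loads a caption file exported from a spreadsheet as comma separated values and locates the next caption by binary search. The column names start, duration and text, and the function names load_captions and next_caption, are assumed for the example only and do not reflect the actual file layout of the first embodiment.

```python
import bisect
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Caption:
    start: float     # seconds from the start of the presentation
    duration: float  # seconds for which the caption is shown
    text: str


def load_captions(csv_text: str) -> List[Caption]:
    """Parse spreadsheet-style CSV and return the captions in time order."""
    rows = csv.DictReader(io.StringIO(csv_text))
    captions = [
        Caption(float(row["start"]), float(row["duration"]), row["text"])
        for row in rows
    ]
    return sorted(captions, key=lambda caption: caption.start)


def next_caption(captions: List[Caption], now: float) -> Optional[Caption]:
    """Return the first caption starting at or after 'now', if any.

    Because the list is time ordered, a binary search suffices; an
    unordered list would require every caption to be examined.
    """
    starts = [caption.start for caption in captions]
    index = bisect.bisect_left(starts, now)
    return captions[index] if index < len(captions) else None
```

In a practical device the list of start times would be built once when the file is loaded rather than on each lookup; it is rebuilt here only to keep the example short.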

In embodiments where the user's mobile telephone is used to provide the captioning, the captioning system can be made interactive whereby the user can interact with the remote server, for example interacting with adverts or questionnaires before the film starts. This interaction can be implemented using, for example, a web browser on the user device that receives URLs and links to other information on websites.

In the first embodiment described above, text captions were provided for the audio in the film to be watched. These captions may include full captions, subtitles for the dialogue only or subtitles at key parts of the plot. Similar variation may be applied for audio captions.

1. A captioning system for providing captions for a presentation to a user, the presentation including a wireless acoustic signal, the captioning system comprising: a caption store operable to store one or more sets of captions each set being associated with one or more presentations and each set comprising a plurality of captions for playout at different timings during the associated presentation; and a cellular telephone having: i) a first receiver configured to receive, from said caption store, at least one set of captions for storage in the cellular telephone or to receive a sequence of captions for a presentation to be made to a user associated with the cellular telephone; ii) a microphone configured to receive the wireless acoustic signal of the presentation and to generate a corresponding electrical signal; iii) a synchronizer configured to process the electrical signal obtained from the microphone corresponding to said acoustic signal of said presentation, to determine synchronization information for use in defining the timing during the presentation at which each caption is to be output to the user associated with the cellular telephone; iv) a caption output operable to output each received caption to the user associated with the cellular telephone; and v) a timing controller configured to determine the timing during the presentation at which each caption should be output based on the synchronization information determined by said synchronizer and configured to control said caption output so that each caption is output to said user at the determined timing.
 2. A system according to claim 1, wherein said captions include text and wherein said caption output is operable to output said captions to a display device associated with the cellular telephone for display to the user.
 3. A system according to claim 2, wherein said captions include formatting information for controlling the format of the text displayed on said display.
 4. A system according to claim 2, wherein each caption includes duration information defining the duration that the caption should be displayed to the user.
 5. A system according to claim 2, wherein said caption includes timing information defining the time at which the caption should be displayed to the user during the presentation.
 6. A system according to claim 1, wherein said presentation includes video.
 7. A system according to claim 1, wherein said caption store is formed in a memory card which is insertable into said cellular telephone and wherein said cellular telephone includes a reader for reading captions from said memory card when inserted therein.
 8. A system according to claim 1, wherein said caption store is provided in a computer system and wherein said cellular telephone includes a communication module for communicating with said computer system.
 9. A system according to claim 8, wherein said computer system is remote from said cellular telephone.
 10. A system according to claim 8, wherein said cellular telephone includes a housing and wherein said communication module is provided within said housing.
 11. A system according to claim 8, wherein said communication module is operable to communicate with said remote computer system using a wireless communication link.
 12. A system according to claim 1, wherein said synchronization information defines time points for one or more predetermined acoustic portions of the presentation.
 13. A system according to claim 1, wherein said receiver is configured to receive synchronization data associated with the set of captions, the synchronization data identifying expected time points of one or more predetermined acoustic portions of the presentation and wherein said synchronizer comprises a monitoring circuit operable to monitor said electrical signal obtained from the microphone to identify actual time points of said one or more predetermined acoustic portions of the presentation and wherein said synchronization information generated by said synchronizer comprises a difference between the actual time points identified by the monitoring circuit and the expected time points identified by said synchronization data.
 14. A system according to claim 13, wherein said synchronization data associated with the set of captions includes a signature for each of said one or more predetermined acoustic portions of the presentation, wherein said monitoring circuit is configured to process said electrical signal from the microphone and to generate a signature for a current portion of the presentation, wherein the synchronizer further comprises a comparator that is configured to compare the signature generated by the monitoring circuit with a signature of said synchronization data and wherein the cellular telephone has: a) an acquisition mode of operation in which the signature generated by the monitoring circuit is compared with each of said signatures of said synchronization data to identify a current position within said presentation; and b) a tracking mode of operation in which the signature generated by said monitoring circuit is compared with an expected one of the signatures of said synchronization data.
 15. A system according to claim 14, wherein during said tracking mode of operation, said monitoring circuit is operable to monitor said presentation during a predetermined time window around an expected time point defined by said synchronization data.
 16. A system according to claim 13, wherein said cellular telephone is configured to receive said synchronization data from said caption store.
 17. A system according to claim 1, wherein said synchronization information is embedded within said acoustic part of the presentation.
 18. A system according to claim 17, wherein said synchronization information comprises synchronization codes that have been added to an audio signal of the presentation so that they occur at different timings during the presentation.
 19. A system according to claim 18, wherein each synchronization code is unique to uniquely define the position in the presentation.
 20. A system according to claim 1, wherein said caption store includes a plurality of sets of captions for a plurality of different presentations.
 21. A system according to claim 20, wherein said cellular telephone is configured to process the electrical signal from the microphone corresponding to the acoustic part of the presentation to determine data for use in identifying the presentation therefrom and is configured to transmit the determined data to said caption store and wherein said caption store is configured to use said determined data to identify the presentation being made and to transmit the associated set of captions for the identified presentation to said first receiver.
 22. A system according to claim 21, wherein said determined data comprises data characteristic of a captured portion of the acoustic part of the presentation and is configured to transmit said characteristic data to said caption store, and wherein said caption store is operable to use said characteristic data to identify the presentation being made and to transmit the associated set of captions for the identified presentation to the cellular telephone.
 23. A system according to claim 1, wherein said presentation is given at a venue, wherein said venue is operable to provide an activation code, wherein said cellular telephone is configured to receive said activation code and further comprises an inhibitor for inhibiting the operation of said caption output unless said cellular telephone has received said activation code.
 24. A portable cellular telephone for use in a captioning system, the portable cellular telephone comprising: i) a first receiver configured to receive, from a caption store, at least one set of captions for storage in the cellular telephone or to receive a sequence of captions for a presentation to be made to a user associated with the cellular telephone; ii) a microphone configured to receive a wireless acoustic signal that forms part of the presentation and to generate a corresponding electrical signal; iii) a synchronizer configured to process the electrical signal obtained from the microphone corresponding to said acoustic signal of said presentation, to determine synchronization information for use in defining the timing during the presentation at which each caption is to be output to the user associated with the cellular telephone; iv) a caption output operable to output each received caption to the user associated with the cellular telephone; and v) a timing controller configured to determine the timing during the presentation at which each caption should be output based on the synchronization information determined by said synchronizer and configured to control said caption output so that each caption is output to said user at the determined timing.
 25. A non-transitory computer readable medium storing computer executable instructions for causing a general purpose computing device to operate as the portable cellular telephone of claim 24.
 26. A captioning system for providing captions for a presentation to a user, the presentation including a wireless acoustic signal, the captioning system comprising: means for storing one or more sets of captions each set being associated with one or more presentations and each set comprising a plurality of captions for play out at different timings during the associated presentation; and a portable cellular telephone having: i) means for receiving at least one set of captions for storage in the cellular telephone or for receiving a sequence of captions for a presentation to be made to a user associated with the cellular telephone; ii) means for receiving the wireless acoustic signal of the presentation and to generate a corresponding electrical signal; iii) a synchronizer configured to process the electrical signal obtained from the microphone corresponding to said acoustic signal of said presentation, to determine synchronization information for use in defining the timing during the presentation at which each caption is to be output to the user associated with the cellular telephone; iv) means for outputting each received caption to the user associated with the cellular telephone; and v) means for determining the timing during the presentation at which each caption should be output based on the synchronization information determined by said synchronizer and for controlling said output means so that each caption is output to said user at the determined timing.
 27. A system according to claim 1, wherein said first receiver is configured to receive said set of captions or said sequence of captions via a telephone network.
 28. A system according to claim 1, wherein said first receiver is configured to receive said set of captions or said sequence of captions over a wired communications link in advance of the presentation.
 29. A system according to claim 1, wherein said cellular telephone is configured to use said first receiver to download a next caption from said caption store when it detects a synchronization code in the electrical signal received from said microphone that corresponds to the acoustic part of the presentation.
 30. A system according to claim 1, wherein said caption store is provided by a remote server.
 31. A cellular telephone according to claim 24, wherein said captions include text for display and wherein said captions include formatting information for controlling the format of the displayed text.
 32. A cellular telephone according to claim 31, wherein each caption includes duration information defining a duration that the caption should be displayed on said display.
 33. A cellular telephone according to claim 31, wherein said caption includes timing information defining the time at which the caption should be displayed to the user during the presentation.
 34. A cellular telephone according to claim 24, wherein said caption store is provided in a remote computer system, wherein said cellular telephone includes a communication module for communicating with said computer system and wherein said first receiver forms part of said communicating means.
 35. A cellular telephone according to claim 34, wherein said communication module is configured to communicate with said remote computer system using a wireless communication link.
 36. A cellular telephone according to claim 24, configured to receive synchronization data defining the timing during the presentation at which each caption is to be output to the user associated with the cellular telephone and wherein said synchronizer is configured to determine said synchronization information using the received synchronization data and the electrical signal from the microphone.
 37. A cellular telephone according to claim 36, wherein said synchronization data defines expected time points for one or more predetermined portions of the presentation.
 38. A cellular telephone according to claim 24, wherein said receiver is configured to receive synchronization data associated with the set of captions, the synchronization data identifying expected time points of one or more predetermined acoustic portions of the presentation and wherein said synchronizer comprises a monitoring circuit operable to monitor said electrical signal obtained from the microphone to identify the actual time points of said one or more predetermined acoustic portions of the presentation and wherein said synchronization information generated by said synchronizer comprises a difference between the actual time points identified by the monitoring circuit and the expected time points identified by said synchronization data.
 39. A cellular telephone according to claim 38, wherein said synchronization data associated with the set of captions includes a signature for each of said one or more predetermined acoustic portions of the presentation, wherein said monitoring circuit is configured to process said electrical signal from the microphone and to generate a signature for a current portion of the presentation, wherein the synchronizer further comprises a comparator that is configured to compare the signature generated by the monitoring circuit with a signature of said synchronization data and wherein the cellular telephone has: (a) an acquisition mode of operation in which the signature generated by said monitoring circuit is compared with each of said signatures of said synchronization data to identify a current position within said presentation; and (b) a tracking mode of operation in which the signature generated by said monitoring circuit is compared with an expected one of the signatures of said synchronization data.
 40. A cellular telephone according to claim 39, wherein during said tracking mode of operation, said monitoring circuit is configured to monitor said presentation during a predetermined time window around an expected time point defined by said synchronization data.
 41. A cellular telephone according to claim 24, wherein said synchronization information is embedded within said acoustic part of the presentation and wherein said synchronizer comprises a monitoring circuit configured to monitor the electrical signal from the microphone corresponding to the acoustic part of the presentation and to extract said synchronization information therefrom.
 42. A cellular telephone according to claim 41, wherein said synchronization information comprises synchronization codes occurring at different timings during the presentation.
 43. A device according to claim 42, wherein each synchronization code is unique to uniquely define the position in the presentation.
 44. A cellular telephone according to claim 24, wherein said caption store includes a plurality of sets of captions for a plurality of different presentations and wherein the cellular telephone is configured to process the electrical signal from the microphone corresponding to the acoustic part of the presentation to determine data for use in identifying the presentation therefrom and is configured to transmit the determined data to said caption store for use by the caption store to identify the presentation being made.
 45. A cellular telephone according to claim 44, wherein the determined data comprises data characteristic of a captured portion of the acoustic part of the presentation and configured to transmit said characteristic data to said caption store for use by the caption store to identify the presentation.
 46. A method of providing captions for a presentation to a user, the presentation including a wireless acoustic signal and the method comprising: storing, at a caption store, one or more sets of captions each being associated with one or more presentations and each comprising a plurality of captions for playout at different timings during the associated presentation; and at a portable cellular telephone: using a first receiver to receive, from said caption store, at least one set of captions for storage in the cellular telephone or to receive a sequence of captions for a presentation to be made to an associated user; using a microphone to receive the wireless acoustic signal of the presentation and to generate a corresponding electrical signal; processing the electrical signal obtained from the microphone corresponding to said acoustic signal of said presentation, to determine synchronization information for use in defining the timing during the presentation at which each caption is to be output to the user; outputting the captions to the associated user; and determining the timing during the presentation at which each caption should be output based on the determined synchronization information and controlling the outputting step so that each caption is output to the user at the determined timing. 